20 research outputs found

Editorial of the special issue on latest advancements in linguistic linked data

Author: Bosque-Gil Julia
Cimiano Philipp
Dojchinovski Milan
Publication venue: 'IOS Press'
Publication date: 01/01/2022

Since the inception of the Open Linguistics Working Group in 2010, there have been numerous efforts to transform language resources into Linked Data. The research field of Linguistic Linked Data (LLD) has gained in importance, visibility and impact, and the Linguistic Linked Open Data (LLOD) cloud now gathers over 200 resources. With this growth, new challenges have emerged concerning particular domain and task applications, quality dimensions, and the linguistic features to take into account. This special issue aims to review and summarize the progress and status of LLD research in recent years, as well as to offer an understanding of the challenges ahead of the field for the years to come. The papers in this issue indicate that there are still aspects to address for a wider community adoption of LLD, as well as a lack of resources for specific tasks and (interdisciplinary) domains. Likewise, the integration of LLD resources into Natural Language Processing (NLP) architectures and the search for long-term infrastructure solutions to host LLD resources remain essential points to attend to in the foreseeable future of this research line.

Repositorio Universidad de Zaragoza

Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

Author: Croft W Bruce
Dojchinovski Milan
Evan Sandhaus
Manning Christopher D.
Mihalcea Rada
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/05/2018

This paper presents a Kernel Entity Salience Model (KESM) that improves text understanding and retrieval by better estimating entity salience (importance) in documents. KESM represents entities by knowledge-enriched distributed representations, models the interactions between entities and words by kernels, and combines the kernel scores to estimate entity salience. The whole model is learned end-to-end using entity salience labels. The salience model also improves ad hoc search accuracy, providing effective ranking features by modeling the salience of query entities in candidate documents. Our experiments on two entity salience corpora and two TREC ad hoc search datasets demonstrate the effectiveness of KESM over frequency-based and feature-based methods. We also provide examples showing how KESM conveys its text understanding ability learned from entity salience to search.

arXiv.org e-Print Archive

Crossref

Language resources and linked data: a practical perspective

Author: Baron Ciro
Dojchinovski Milan
Flati Tiziano
Gracia del Río Jorge
McCrae John P.
Vila Suero Daniel
Publication venue: E.T.S. de Ingenieros Informáticos (UPM)
Publication date: 01/01/2014

Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of the LD principles is leading to an emerging ecosystem of multilingual open resources that conform to the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. To contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) to LD on the Web and for developing some useful NLP tasks with them (e.g., word sense disambiguation). This material was the basis of a tutorial given at the EKAW’14 conference, which is also reported in the paper.

Archivo Digital UPM

2nd Conference on Language, Data and Knowledge (LDK 2019), May 20–23, 2019, Leipzig, Germany

Author: Buitelaar Paul
Chiarcos Christian
de Melo Gerard
Dojchinovski Milan
Eskevich Maria
Fäth Christian
Klimek Bettina
McCrae John P.
Publication date: 27/04/2023

OPUS Augsburg

A survey of guidelines and best practices for the generation, interlinking, publication, and validation of linguistic linked data

Author: Chiarcos Christian
Declerck Thierry
Di Buono Maria Pia
Dojchinovski Milan
Gifu Daniela
Gracia Jorge
Khan Fahad
Valunaite Oleskeviciene Giedre
Publication date: 24/04/2023

This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular, it focused on four core tasks in the production and publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally, we offer a number of directions for future work to address the findings of the survey.

OPUS Augsburg

A Survey of Guidelines and Best Practices for the Generation, Interlinking, Publication, and Validation of Linguistic Linked Data

Author: Anas Fahad Khan
Christian Chiarcos
Daniela Gifu
di Buono Maria Pia
Giedre Valunaite Oleskeviciene
Jorge Gracia
Milan Dojchinovski
Thierry Declerck
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2022

Università degli Studi di Napoli L'Orientale: CINECA IRIS

Knowledge Base Creation, Enrichment and Repair

Author: Dimitris Kontokostas
Jens Lehmann
Lorenz Bühmann
Milan Dojchinovski
Mladen Stanojević
Ondřej Zamazal
Petar Petrovski
Sebastian Hellmann
Uroš Milošević
Vojtěch Svátek
Volha Bryl
Publication venue: Springer International Publishing
Publication date: 01/01/2014

Springer - Publisher Connector

Cross-Lingual Link Discovery for Under-Resourced Languages

Author: Ahmadi Sina
Apostol Elena-Simona
Bosque-Gil Julia
Chiarcos Christian
Dojchinovski Milan
Gkirtzou Katerina
Gracia Jorge
Gromann Dagmar
Liebeskind Chaya
Rosner Michael
Serasset Gilles
Truica Ciprian-Octavian
Valūnaitė-Oleškevičienė Giedrė
Publication date: 01/01/2022

CC BY-NC 4.0

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that, for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

Mykolas Romeris University Institutional Repository

DEVELOPING MASHUP APPLICATIONS USING EMML

Author: Dojchinovski Milan
Publication venue: M. Dojchinovski
Publication date: 16/09/2010

In this thesis we present enterprise mashups and the Enterprise Mashup Markup Language (EMML) in detail. We go through the architecture of enterprise mashups to make it easier to identify the challenges they bring, and we stress the need to introduce enterprise mashups in enterprises. A detailed description follows of the core of EMML as a standard for enterprise mashup development. We present the advantages that EMML brings and identify possible obstacles. The development of enterprise mashups with EMML is demonstrated on a practical case: building mashups for the supervision of international student exchanges.

Digital library of University of Maribor